
Aggregation improvements #10965

Merged

Conversation

skrzypo987 (Member)

A few neat tricks to speed up aggregation a bit.


sopel39 commented Feb 7, 2022

fyi @martint


sopel39 commented Feb 7, 2022

@skrzypo987 could you share perf results?

@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch from 8ea23c9 to 2cb6340 Compare February 8, 2022 09:04
skrzypo987 (Member, Author)

> @skrzypo987 could you share perf results?

I messed up some things, so there is a regression. I'll get back with the results once I fix it.

skrzypo987 (Member, Author)

before:

BenchmarkGroupByHashOnTpch.groupBy                  false        0  avgt   20    9,134 ±  0,555  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        1  avgt   20   66,969 ±  2,017  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      2_0  avgt   20   23,235 ±  0,689  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      2_1  avgt   20   40,110 ±  0,420  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        3  avgt   20   55,257 ±  1,738  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      4_0  avgt   20   48,726 ±  1,106  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      4_1  avgt   20   81,607 ±  0,805  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        5  avgt   20   53,240 ±  1,124  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        7  avgt   20  123,469 ±  1,859  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        8  avgt   20    8,932 ±  0,136  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        9  avgt   20   60,057 ±  1,000  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     11_0  avgt   20   22,098 ±  0,386  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     11_1  avgt   20   12,275 ±  0,140  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       12  avgt   20   47,678 ±  1,636  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     13_0  avgt   20   27,317 ±  0,579  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     13_1  avgt   20   37,461 ±  0,424  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       15  avgt   20   22,595 ±  0,226  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     16_0  avgt   20  533,342 ±  4,900  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     16_1  avgt   20  257,857 ±  4,977  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       17  avgt   20   25,246 ±  0,836  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       18  avgt   20   82,138 ±  1,520  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       20  avgt   20   35,989 ±  1,407  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       21  avgt   20  385,947 ± 10,523  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       22  avgt   20   24,116 ±  0,744  ns/op

after:

BenchmarkGroupByHashOnTpch.groupBy                  false        0  avgt   20    6,756 ±  0,049  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        1  avgt   20    2,521 ±  0,034  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      2_0  avgt   20   19,771 ±  0,672  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      2_1  avgt   20   39,528 ±  0,643  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        3  avgt   20   47,519 ±  1,084  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      4_0  avgt   20   48,953 ±  0,771  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false      4_1  avgt   20   69,582 ±  1,493  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        5  avgt   20   51,688 ±  1,756  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        7  avgt   20  127,517 ±  0,964  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        8  avgt   20    6,692 ±  0,035  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false        9  avgt   20   65,012 ±  1,214  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     11_0  avgt   20   17,419 ±  0,184  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     11_1  avgt   20   12,352 ±  0,160  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       12  avgt   20   48,045 ±  0,949  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     13_0  avgt   20   22,845 ±  0,593  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     13_1  avgt   20   35,829 ±  0,841  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       15  avgt   20   19,283 ±  0,421  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     16_0  avgt   20  487,442 ±  5,271  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false     16_1  avgt   20  212,259 ±  4,008  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       17  avgt   20   22,631 ±  0,952  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       18  avgt   20   70,313 ±  1,561  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       20  avgt   20   23,919 ±  1,027  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       21  avgt   20  362,150 ± 10,634  ns/op
BenchmarkGroupByHashOnTpch.groupBy                  false       22  avgt   20   21,731 ±  0,560  ns/op

@skrzypo987 skrzypo987 added the WIP label Feb 8, 2022
@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch 2 times, most recently from eb12136 to f5ae9a7 Compare February 9, 2022 08:51
skrzypo987 (Member, Author)

I got rid of the biggest commit, as it showed some regressions that were difficult to spot. I'll return to it in a different PR.
The only benchmark change with the current state of the PR is tpch/q01 going from ~50 ns/row to ~2.5 ns/row.

@skrzypo987 skrzypo987 removed the WIP label Feb 9, 2022
@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch from f5ae9a7 to 406a41b Compare February 9, 2022 15:12
    combinationIds[i] = (short) blocks[0].getId(i);
}
for (int j = 1; j < channels.length; j++) {
    for (int i = 0; i < positionCount; i++) {
Member:

Why iterate over channels first? We then update combinations multiple times, but we could compute each one once per position.

Member Author:

So that we iterate over a single block at a time. Microbenchmarks show better results that way.
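The channel-outer loop order the author describes can be sketched standalone (hypothetical names; `ids` is a channel-major array of per-position dictionary ids, not the PR's actual API — the real code reads ids from `DictionaryBlock`s):

```java
public class ChannelOuterLoop
{
    // Hypothetical sketch: the outer loop walks channels, so the inner loop
    // streams sequentially over a single ids array per pass instead of
    // touching every block for every position.
    static int[] combineChannelOuter(int[][] ids, int[] dictionarySizes, int positionCount)
    {
        int[] combinationIds = new int[positionCount];
        for (int i = 0; i < positionCount; i++) {
            combinationIds[i] = ids[0][i]; // seed with channel 0's dictionary ids
        }
        for (int j = 1; j < ids.length; j++) {        // one channel (block) at a time
            for (int i = 0; i < positionCount; i++) { // tight, predictable inner loop
                combinationIds[i] = combinationIds[i] * dictionarySizes[j] + ids[j][i];
            }
        }
        return combinationIds;
    }
}
```

The trade-off is exactly the one raised above: each `combinationIds[i]` is written once per channel rather than once per position, but every pass reads one block linearly, which the author's microbenchmarks found faster.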

}
for (int j = 1; j < channels.length; j++) {
    for (int i = 0; i < positionCount; i++) {
        combinationIds[i] *= dictionarySizes[channels.length - j];
Member:

Is it correct that we take dictionarySizes[channels.length - j]?
I'm not sure I understand correctly, but e.g.

dict size 1, 2, 3
values 0, 1, 0
(((0 * 2) + 1 ) * 2) + 0 = 2

values 0, 0, 2
(((0 * 2) + 0 ) * 4) + 2 = 2

Member Author:

You are absolutely right. This is one of those "how could that even have worked" kinds of bugs.
I fixed it and added some tests.
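The fixed scheme is a standard mixed-radix encoding: each channel's dictionary id is scaled by the sizes of the dictionaries that follow it, so distinct value tuples always produce distinct combination ids. A minimal sketch (illustrative names, not the PR's code), using the reviewer's example above:

```java
public class CombinationId
{
    // Mixed-radix encoding: id = ((v0 * size1 + v1) * size2 + v2) ...
    // Distinct tuples map to distinct ids whenever 0 <= values[c] < dictionarySizes[c].
    static int encode(int[] values, int[] dictionarySizes)
    {
        int id = 0;
        for (int c = 0; c < values.length; c++) {
            id = id * dictionarySizes[c] + values[c];
        }
        return id;
    }
}
```

With dictionary sizes (1, 2, 3), the tuples (0, 1, 0) and (0, 0, 2) encode to 3 and 2 respectively, where the buggy indexing collided.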

@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch 3 times, most recently from 7589e02 to 25eca66 Compare February 21, 2022 09:39
skrzypo987 (Member, Author)

Got rid of the commit introducing the tpch aggregation benchmark, as a newer version of it exists in #11031

@@ -219,6 +223,9 @@ public void appendValuesTo(int groupId, PageBuilder pageBuilder, int outputChann
if (isRunLengthEncoded(page)) {
Member:

You should call page = page.getLoadedPage() before the dictionary checks (same for getGroupIds, and same for BigintGroupByHash).

Member Author:

This is unrelated to the PR


sopel39 commented Feb 23, 2022

Is BenchmarkGroupByHashOnTpch part of OS? These benchmark results should be part of the commit message.

@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch 2 times, most recently from 8d185a4 to ef28b26 Compare February 25, 2022 12:50
    return false;
}
cardinality = multiplyExact(cardinality, ((DictionaryBlock) page.getBlock(channel)).getDictionary().getPositionCount());
if (cardinality > positionCount * SMALL_DICTIONARIES_MAX_CARDINALITY_RATIO || cardinality > SMALL_DICTIONARIES_MAX_CARDINALITY) {
Member:

nit: I would remove cardinality > SMALL_DICTIONARIES_MAX_CARDINALITY and leave only the ratio check, unless there is a strong reason to keep it.

Member Author:

Fine. I ran some benchmarks and there is indeed little chance of a regression.
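With the absolute cap dropped as suggested, the remaining guard is just the ratio check. A hedged standalone sketch (the ratio value here is made up for illustration; the PR defines the real SMALL_DICTIONARIES_MAX_CARDINALITY_RATIO, and the real code reads sizes from DictionaryBlocks):

```java
public class SmallDictionariesGuard
{
    // Hypothetical threshold value, for illustration only.
    static final double MAX_CARDINALITY_RATIO = 0.25;

    // Take the small-dictionaries fast path only while the combined dictionary
    // cardinality stays small relative to the number of positions in the page.
    static boolean useSmallDictionaries(int[] dictionarySizes, int positionCount)
    {
        long cardinality = 1;
        for (int size : dictionarySizes) {
            cardinality = Math.multiplyExact(cardinality, (long) size); // overflow-safe product
            if (cardinality > positionCount * MAX_CARDINALITY_RATIO) {
                return false; // too many combinations for a flat lookup array to pay off
            }
        }
        return true;
    }
}
```

Checking the ratio inside the loop means the product can bail out early, before it grows large enough to matter for `multiplyExact`.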

@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch 3 times, most recently from e9ed1ba to 1e161fa Compare March 1, 2022 07:29
@skrzypo987 skrzypo987 requested a review from sopel39 March 1, 2022 11:09
@sopel39 (Member) left a comment:

lgtm % comments

secondBlock = BlockAssertions.createLongDictionaryBlock(10, 100, 7);
page = new Page(firstBlock, secondBlock);

groupByHash.addPage(page).process();
Member:

ditto

@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch from 1e161fa to 59f8dfb Compare March 4, 2022 12:52
@skrzypo987 skrzypo987 requested a review from sopel39 March 4, 2022 12:53
@sopel39 (Member) left a comment:

lgtm % comment

@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch from 59f8dfb to f78d524 Compare March 7, 2022 07:05
If the number of combinations of all dictionaries in a page is below a certain number,
we can store the results in a small array and reuse found groups
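The idea in that commit message can be sketched as follows (standalone and hypothetical: in the real code each new combination is resolved through the GroupByHash table, which here is simulated by handing out sequential group ids):

```java
public class SmallDictionaryGroupCache
{
    // When the total number of dictionary-value combinations is small, resolve
    // the group id for each combination once and reuse it from a flat array.
    static int[] assignGroups(int[] combinationIds, int cardinality)
    {
        int[] cache = new int[cardinality];
        java.util.Arrays.fill(cache, -1); // -1 marks "combination not resolved yet"
        int[] groupIds = new int[combinationIds.length];
        int nextGroupId = 0; // stand-in for the real hash-table lookup
        for (int i = 0; i < combinationIds.length; i++) {
            int cid = combinationIds[i];
            if (cache[cid] == -1) {
                cache[cid] = nextGroupId++; // expensive lookup happens once per combination
            }
            groupIds[i] = cache[cid]; // every repeat is a plain array read
        }
        return groupIds;
    }
}
```

This is why the cardinality guard matters: the cache array is allocated per distinct combination, so the technique only wins while the combination count stays well below the position count.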
@skrzypo987 skrzypo987 force-pushed the skrzypo/053-aggregation-improvements branch from f78d524 to ffd1ee8 Compare March 7, 2022 11:10
@skrzypo987 skrzypo987 requested a review from sopel39 March 7, 2022 12:35